Systematic Biology — Latest Matching Preprints

1

An Extended Clade Framework for Annotated Trees in the Context of Phylogeography and Transmission Tree Inference

Berling, L.; Colijn, C.

2026-04-27 bioinformatics 10.64898/2026.04.23.720428 medRxiv

Top 0.1%

78.6%

Bayesian phylogenetic inference produces large samples from a posterior distribution over phylogenetic trees that represents uncertainty in both tree topology and associated variables. Such a collection of trees is hard to interpret and it is common practice to summarize such samples into a single representative tree. Methods for constructing representative trees have largely been restricted to plain tree topologies, encoding only relationships among taxa. Inference with more sophisticated models produces annotated tree objects. These have additional information representing node locations in the case of phylogeography, host information when inferring transmission trees, or sampled ancestor status when incorporating fossil information. Nevertheless, these annotated representations are reduced to a single representative tree, typically using methods developed for plain tree topologies and without accounting for the resulting methodological mismatch. Here, we introduce the concept of an extended clade and investigate an extension of the conditional clade distribution (CCD) model. Through motivating examples and case studies in discrete trait phylogeography and transmission tree reconstruction, we demonstrate limitations of standard summary tree approaches and show how these can be addressed using an extended CCD framework that explicitly incorporates the annotated tree structure.

2

A discrete character evolution model for phylogenetic comparative biology with Γ-distributed rate heterogeneity among branches of the tree

Revell, L. J.; Harmon, L. J.

2024-05-30 evolutionary biology 10.1101/2024.05.25.595896 medRxiv

Top 0.1%

77.1%

Phylogenetic comparative methods are now widely used to measure trait evolution on the tree of life. Often these methods involve fitting an explicit model of character evolution to trait data and then comparing the explanatory power of this model to alternative scenarios. In this article, we present a new model for discrete trait evolution in which the rate of character change in the tree varies from edge (i.e., "branch") to edge of the phylogeny according to a discretized Γ distribution. When the edge-wise rates of evolution are, in fact, Γ-distributed, we show via simulation that this model can be used to reliably estimate the shape parameter (α) of the distribution of rate variation among edges. We also describe how our model can be employed in ancestral state reconstruction, and demonstrate via simulation how doing so will tend to increase the accuracy of our estimated states when the generating edge rates are Γ-distributed. We discuss how marginal edge rates are estimated under the model, and apply our method to a real dataset of digit number in squamate reptiles, modified from Brandley et al. (2008).

3

Macroevolutionary analysis of discrete traits with rate heterogeneity

Grundler, M.; Rabosky, D. L.

2020-01-08 evolutionary biology 10.1101/2020.01.07.897777 medRxiv

Top 0.1%

76.7%

Organismal traits show dramatic variation in phylogenetic patterns of origin and loss across the Tree of Life. Understanding the causes and consequences of this variation depends critically on accounting for heterogeneity in rates of trait evolution among lineages. Here, we describe a method for modeling among-lineage evolutionary rate heterogeneity in a trait with two discrete states. The method assumes that the present-day distribution of a binary trait is shaped by a mixture of stochastic processes in which the rate of evolution varies among lineages in a phylogeny. The number and location of rate changes, which we refer to as rate-shift events, are inferred automatically from the data. Simulations reveal that the method accurately reconstructs rates of trait evolution and ancestral character states even when simulated data violate model assumptions. We apply the method to an empirical dataset of mimetic coloration in snakes and find elevated rates of trait evolution in two clades of harmless snakes that are broadly sympatric with dangerously venomous New World coral snakes, recapitulating an earlier analysis of the same dataset. Although the method performed well on many simulated data sets, we caution that overall power for inferring heterogeneous dynamics of single binary traits is low.

4

Putting the F in FBD analyses: tree constraints or morphological data?

Barido-Sottani, J.; Pohle, A.; De Baets, K.; Murdock, D.; Warnock, R.

2022-07-18 evolutionary biology 10.1101/2022.07.07.499091 medRxiv

Top 0.1%

74.6%

The fossilized birth-death (FBD) process provides an ideal model for inferring phylogenies from both extant and fossil taxa. Using this approach, fossils (with or without character data) are directly considered as part of the tree. This leads to a statistically coherent prior on divergence times, where the variance associated with node ages reflects uncertainty in the placement of fossil taxa in the phylogeny. Since fossils are typically not associated with molecular sequences, additional information is required to place fossils in the tree. Previously, this information has been provided in two different forms: using topological constraints, where the user specifies monophyletic clades based on established taxonomy, or so-called total-evidence analyses, which use a morphological data matrix with data for both fossil and extant specimens in addition to the molecular alignment. In this work, we use simulations to evaluate these different approaches to handling fossil placement in FBD analyses, both in ideal conditions and in datasets including uncertainty or even errors. We also explore how rate variation in fossil recovery or diversification rates impacts these approaches. We find that the extant topology is well recovered under all methods of fossil placement. Divergence times are similarly well recovered across all methods, with the exception of constraints which contain errors. These results are consistent with expectations: in FBD inferences, divergence times are mostly informed by fossil ages, so variations in the position of fossils strongly impact these estimates. On the other hand, the placement of extant taxa in the phylogeny is driven primarily by the molecular alignment.
We see similar patterns in datasets which include rate variation; however, one notable difference is that relative errors in extant divergence times increase when more variation is included in the dataset, for all approaches using topological constraints, and particularly for constraints with errors. Finally, we show that trees recovered under the FBD model are more accurate than those estimated using non-FBD (i.e., non-time calibrated) inference. This result holds even with the use of erroneous fossil constraints and model misspecification under the FBD. Overall, our results underscore the importance of core taxonomic research, including morphological data collection and species descriptions, irrespective of the approach to handling phylogenetic uncertainty using the FBD process.

5

Quantitative Models for Distinguishing Punctuated and Continuous-Time Models of Character Evolution and Their Implications for Macroevolutionary Theory

Wright, A. M.; Wagner, P. J.

2024-04-13 paleontology 10.1101/2024.04.09.588788 medRxiv

Top 0.1%

74.0%

The recently proliferating quantitative models for assessing anatomical character evolution all assume that character change happens continuously through time. However, the punctuated equilibrium model posits that character change should coincide with cladogenetic events, and thus should be tied to origination rates. Rates of cladogenesis are important to quantitative phylogenetics, but typically only for establishing prior probabilities of phylogenetic topologies. Here, we modify existing character likelihood models to use the local cladogenesis rates from Bayesian analyses to generate the amounts of character change over time dependent on origination rates, as expected under the punctuated equilibrium model. In the case of strophomenoid brachiopods from the Ordovician, we find that Bayesian analyses strongly favor punctuated models over continuous-time models, with elevated rates of cladogenesis early in the clade's history inducing frequencies of change despite constant rates of change per speciation event. This corroborates prior work proposing that the early burst in strophomenoid disparity reflects simply elevated speciation rates, which in turn has implications for seemingly unrelated macroevolutionary theory about whether early bursts reflect shifts in intrinsic constraints or empty ecospace. Future development of punctuated character evolution models should account for the full durations of species, which will provide a test of continuous change rates. Ultimately, continuous change vs. punctuated change should become part of phylogenetic paleobiology in the same way that other tests of character evolution currently are.
Non-technical Summary: Punctuated equilibrium predicts a distribution of anatomical change that is fundamentally different from the models used in studies of relationships among species. We present a model to assess relationships that assumes punctuated change. We apply this model to a dataset of strophomenoid brachiopods to demonstrate that a model of punctuated change fits better than a model of continuous-time ("phyletic gradualism") change in this group. Notably, because the punctuated model posits elevated speciation rates early in the strophomenoid history, the model also posits elevated rates of change among the early strophomenoids relative to later ones. This corroborates notions for what causes bursts of anatomical evolution rooted in ecological theory rather than evolutionary developmental theory. More basically, it emphasizes that paleontologists should consider both punctuated and continuous-time models when assessing relationships and other aspects of macroevolutionary theory.

6

A covarion model for phylogenetic estimation using discrete morphological datasets

Khakurel, B.; Hoehna, S.

2026-02-20 evolutionary biology 10.1101/2025.06.20.660793 medRxiv

Top 0.1%

73.1%

The rate of evolution of a single morphological character is not homogeneous across the phylogeny and this rate heterogeneity varies between morphological characters. However, traditional models of morphological character evolution often assume that all characters evolve according to a time-homogeneous Markov process, which applies uniformly across the entire phylogeny. While models incorporating among-character rate variation alleviate the assumption of the same rate for all characters, they still fail to address lineage-specific rate variation for individual characters. The covarion model, originally developed for molecular data to model the invariability of some sites for parts of the phylogeny, provides a promising framework for addressing this issue in morphological phylogenetics. In this study, we extend the covarion model in RevBayes to morphological character evolution, which we call the covariomorph model, and apply it to a diverse range of morphological datasets. Our covariomorph model utilizes multiple rate categories derived from a discretized probability distribution, which scales rate matrices accordingly. Characters are allowed to evolve within any of these rate categories, with the possibility of switching between rate categories during the evolutionary process. We verified our implementation of the covariomorph model with the help of simulations. Additionally, we examined 164 empirical datasets, finding patterns of rate heterogeneity compatible with covarion-like dynamics in approximately half of them. Upon further examination of two focal datasets that exhibited covarion-like rate variation, we found that the covariomorph model provides a more nuanced approach to incorporate rate variation across lineages, significantly affecting the resulting tree topology and branch lengths compared to traditional models.
The observed sensitivity of branch lengths to model choice underscores potential implications of this approach for divergence time estimation and evolutionary rate calculations. By accounting for lineage- and character-specific rate shifts, the covariomorph model offers a robust framework to improve the accuracy of morphological phylogenetic inference.

7

The Fossilized Birth Death Process with heterogeneous diversification rates unravels the link between diversification and specialisation to a carnivorous diet in Nimravidae (Carnivoraformes)

Chabrol, N.; Morlon, H.; Barido-Sottani, J.

2025-07-18 evolutionary biology 10.1101/2025.07.15.664897 medRxiv

Top 0.1%

70.5%

Bayesian phylogenetic inference uses increasingly complex diversification models as tree priors to test new macroevolutionary hypotheses. However, those models are usually developed in a neontological framework, despite the increasing number of datasets covering both extant and fossil taxa, as well as the fact that many clades are entirely extinct. In this paper, we develop the F-ClaDS model, a Fossilized-Birth-Death (FBD) version of the cladogenetic diversification rate shift (ClaDS) model, in BEAST2. ClaDS estimates partially inherited branch-specific rates from a phylogeny, providing a nuanced and detailed perspective of the variations in diversification across the tree. Our extension allows the integration of fossil samples directly into the phylogeny. We apply our new implementation to a dataset of 36 Nimravidae, a fully extinct carnivoraform clade that spanned from the Early Eocene to the Late Miocene, whose species had different degrees of specialisation to a carnivorous diet. We show that using different tree priors does not substantially affect the topology of the inferred trees, but affects the ages of nodes and tips, as well as branch lengths. F-ClaDS also recovers more species as sampled ancestors than the homogeneous FBD model. The branches with the highest speciation and extinction rates are those corresponding to the hypercarnivorous clades (Hoplophoneus and the barbourofelins), supporting the view that specialization to a hypercarnivorous diet can spur speciation, but also increase extinction risk, especially during times of global ecosystem change, potentially due to a high position in the trophic chain.

8

State Space Misspecification in Morphological Phylogenetics: A Pitfall for Models and Parsimony Alike

Huang, E.

2025-04-26 evolutionary biology 10.1101/2025.04.22.650124 medRxiv

Top 0.1%

70.4%

Phylogenetic analysis relies on two fundamental levels of biological information: genotype and phenotype. Molecular data benefit from operating within a well-defined, finite state space (e.g., nucleotide alphabets), whereas morphological data present inherent challenges due to frequently ambiguous character states and variable state counts. In this study, I use simulated data to examine how state space misspecification (SSM), defined as the mismatch between the assumed and true state space, affects phylogenetic reconstruction. Results show that SSM generally reduces topological accuracy, with the extent of its impact depending on mutation rate, state space disparity, and the proportion of affected characters. Counterintuitively, under conditions typical of empirical morphological datasets (high proportions of binary characters and elevated mutation rates), SSM can improve topological precision. This creates a paradox where an incorrect model outperforms a correct one, though at the cost of distorted branch lengths. Importantly, the effects of SSM extend beyond model-based approaches. I demonstrate, through an extension of the no common mechanism (NCM) model, that standard maximum parsimony is consistent with the assumption that characters evolved under an SSM model--a largely overlooked feature. To address this, I propose a state-space-aware weighting scheme that accounts for variation in character state space. I also discuss additional strategies for mitigating SSM, including model adjustments and reducing reliance on oversimplified binary coding. This work underscores the need to explicitly address state space uncertainty in morphological phylogenetics. As morphology remains crucial for reconstructing deep-time lineages and integrating fossils, accounting for SSM is essential to improving the reliability of evolutionary trees.

9

Macroevolutionary analysis of discrete character evolution using parsimony-informed likelihood

Grundler, M.; Rabosky, D. L.

2020-01-08 evolutionary biology 10.1101/2020.01.07.897603 medRxiv

Top 0.1%

70.1%

Rates of character evolution in macroevolutionary datasets are typically estimated by maximizing the likelihood function of a continuous-time Markov chain (CTMC) model of character evolution over all possible histories of character state change, a technique known as maximum average likelihood. An alternative approach is to estimate ancestral character states independently of rates using parsimony and to then condition likelihood-based estimates of transition rates on the resulting ancestor-descendant reconstructions. We use maximum parsimony reconstructions of possible pathways of evolution to implement this alternative approach for single-character datasets simulated on empirical phylogenies using a two-state CTMC. We find that transition rates estimated using parsimonious ancestor-descendant reconstructions have lower mean squared error than transition rates estimated by maximum average likelihood. Although we use a binary state character for exposition, the approach remains valid for an arbitrary number of states. Finally, we show how this method can be used to rapidly and easily detect phylogenetic variation in tempo and mode of character evolution with two empirical examples from squamates. These results highlight the mutually informative roles of parsimony and likelihood when testing hypotheses of character evolution in macroevolution.
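To make the idea of conditioning a rate estimate on fixed ancestor-descendant states concrete, here is a minimal sketch. It is not the authors' implementation: it assumes a symmetric two-state CTMC with a single rate, takes the ancestral states as already given (for example, from a parsimony reconstruction), and uses a hypothetical list of branches and a simple grid search in place of a proper optimizer.

```python
import math

def p_transition(start, end, rate, t):
    """Transition probability over time t for a symmetric two-state CTMC."""
    decay = math.exp(-2.0 * rate * t)
    return 0.5 + 0.5 * decay if start == end else 0.5 - 0.5 * decay

def log_likelihood(rate, branches):
    """Log-likelihood of the rate given fixed (parent_state, child_state, length) branches."""
    return sum(math.log(p_transition(a, b, rate, t)) for a, b, t in branches)

def estimate_rate(branches, grid=None):
    """Grid-search estimate of the rate (illustrative only, not an efficient optimizer)."""
    grid = grid or [i / 1000.0 for i in range(1, 5001)]
    return max(grid, key=lambda q: log_likelihood(q, branches))

# Hypothetical reconstruction: each tuple is (parent state, child state, branch length).
branches = [(0, 0, 1.0), (0, 1, 2.0), (1, 1, 0.5), (1, 1, 1.5), (0, 0, 3.0)]
rate_hat = estimate_rate(branches)
print(rate_hat)
```

Because the states at both ends of every branch are fixed, the likelihood factorizes into one term per branch, which is what makes this kind of estimate fast compared with summing over all possible histories.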

10

Bayesian Least-Squares Supertrees (BLeSS): flexible inference of large time-calibrated phylogenies

Cerny, D.; Slater, G. J.

2024-12-05 evolutionary biology 10.1101/2024.11.29.625936 medRxiv

Top 0.1%

69.5%

Time-calibrated phylogenies are key to macroevolutionary hypothesis testing and parameter inference, but their estimation is difficult when the number of tips is large. Despite its attractive properties, the joint Bayesian inference of topology and divergence times remains computationally prohibitive for large supermatrices. Historically, supertrees represented a popular alternative to supermatrix-based phylogenetic methods, but most of the existing supertree techniques do not accommodate branch lengths or topological uncertainty, rendering them unfit to supply input for modern comparative methods. Here, we present Bayesian Least-Squares Supertrees (BLeSS), a new approach that takes a profile of time trees with partially overlapping leaf sets as its input, and returns the joint posterior distribution of supertree topologies and divergence times as its output. Building upon the earlier exponential error model and average consensus techniques, BLeSS transforms the profile into path-length distance matrices, computes their arithmetic average, and uses MCMC to sample time-calibrated supertrees according to their least-squares fit to the average distance matrix. We provide a fast, flexible, and validated implementation of BLeSS in the program RevBayes, and test its performance using a comprehensive set of simulations. We show that the method performs well across a wide range of conditions, including variation in missing data treatment and the steepness of the error function. Finally, we apply BLeSS to an empirical dataset comprising 33 time trees for 260 species of carnivorans, illustrating its ability to recover well-supported clades and plausible node ages, and discuss how the method can best be used in practice, outlining possible extensions and performance boosts.
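The averaging and least-squares steps described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the RevBayes implementation: the source distance matrices are given directly as dictionaries rather than computed from trees, a pair missing from a source tree is simply skipped when averaging (only one of several possible missing-data treatments), and all function names and toy values are hypothetical.

```python
def pair(a, b):
    """Unordered pair of leaf names, usable as a dictionary key."""
    return frozenset((a, b))

def average_distances(matrices):
    """Average each pairwise distance over the source matrices that contain that pair."""
    totals, counts = {}, {}
    for matrix in matrices:
        for key, dist in matrix.items():
            totals[key] = totals.get(key, 0.0) + dist
            counts[key] = counts.get(key, 0) + 1
    return {key: totals[key] / counts[key] for key in totals}

def least_squares_score(candidate, average):
    """Sum of squared differences over the pairs present in both matrices."""
    shared = candidate.keys() & average.keys()
    return sum((candidate[key] - average[key]) ** 2 for key in shared)

# Two toy source trees with partially overlapping leaf sets {A, B, C} and {A, B, D}.
tree1 = {pair("A", "B"): 2.0, pair("A", "C"): 6.0, pair("B", "C"): 6.0}
tree2 = {pair("A", "B"): 4.0, pair("A", "D"): 10.0, pair("B", "D"): 10.0}
avg = average_distances([tree1, tree2])

# Path-length distances of a hypothetical candidate supertree on {A, B, C, D}.
candidate = {
    pair("A", "B"): 3.0, pair("A", "C"): 6.0, pair("B", "C"): 6.0,
    pair("A", "D"): 10.0, pair("B", "D"): 10.0, pair("C", "D"): 10.0,
}
score = least_squares_score(candidate, avg)
print(avg[pair("A", "B")], score)
```

In a sampler of the kind the abstract describes, a score like this would feed into the error function that determines how readily a proposed supertree is accepted; that step is omitted here.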

11

Phylogenetic inference from an incomplete fossil record

Hohmann, N.; Warnock, R. C. M.; Jarochowska, E.

2026-06-28 paleontology 10.64898/2026.06.24.734220 medRxiv

Top 0.1%

66.8%

Fossil data is crucial to construct phylogenetic time trees, which serve as the basis to test a wide range of evolutionary hypotheses. While the fossil record is known to be incomplete, modern stratigraphy provides predictions of the structure of the fossil record as expressed by gap location and duration. Advances in phylogenetic model development allow us to propagate this information into Bayesian phylogenetic inference in the form of priors on time-variable fossil sampling. However, the impact and role of stratigraphic architectures on time tree inference has so far remained unexplored. We introduce a novel simulation framework that combines realistic stratigraphic forward models with phylogenetic simulations. Using this framework, we examine (1) how stratigraphically plausible model violations of fossil sampling due to gaps affect total-evidence inference under the fossilized birth-death model and (2) if stratigraphic knowledge on gap duration and timing improves inference when incorporated in priors on fossil sampling. We find that total-evidence analysis is robust to stratigraphically plausible distribution of gaps in disparate stratigraphic architectures, with results being instead dominated by the number of morphological characters. Surprisingly, incorporating information on prominent gaps in the stratigraphic record does not improve phylogenetic inference. Our results suggest that phylogenetic inference is robust to model violations introduced by stratigraphic gaps over short timescales, with results being dominated by a priori known data availability constraints such as morphological character matrix size. This research establishes the foundations for joint modeling of phylogenetic and stratigraphic processes and narrows the knowledge gap between paleontology, stratigraphy, and neontology.

12

The Implications of Over-Estimating Gene Tree Discordance on a Rapid-Radiation Species Tree (Blattodea: Blaberidae)

Evangelista, D. A.; Gilchrist, M. A.; Legendre, F.; O'Meara, B.

2019-07-28 evolutionary biology 10.1101/717660 medRxiv

Top 0.1%

66.4%

Patterns of discordance between gene trees and the species trees they reside in are crucial to the debate over the superiority of coalescent or concatenation approaches to tree inference. However, errors in estimating gene tree topologies obfuscate the issue by making gene trees appear erroneously discordant with the species tree. We thus test the prevalence of discordance between gene trees and their species tree using an empirical dataset for a clade with a rapid radiation (Blaberidae). We find that one model of codon evolution (FMutSel0) prefers gene trees that are less discordant, while another (SelAC) shows no such preference. We compare the species trees resulting from the selected sets of gene trees on the basis of internal consistency, predictive ability, and congruence with independent data. The species tree resulting from the gene trees chosen by FMutSel0, a set with low discordance, is the most robust and biologically plausible. Thus, we conclude that the results from FMutSel0 are better supported: simple models (i.e., GTR and ECM) infer trees with erroneously high levels of gene tree discordance. Furthermore, the amount of discordance in the set of gene trees has a large effect on the downstream phylogeny. Thus, decreasing gene tree error by lessening erroneous discordance can result in higher quality species trees. These results allow us to support relationships among blaberid cockroaches that were previously in flux as they now demonstrate molecular and morphological congruence.

13

Identifying Evolutionary Relatedness Effects on Diversification from Phylogenies using Neural Networks

Qin, T.; van Benthem, K. J.; Valente, L.; Etienne, R.

2026-01-13 evolutionary biology 10.64898/2026.01.12.699140 medRxiv

Top 0.1%

66.0%

Reconstructing the forces that shaped macroevolutionary histories from extant phylogenies is fundamentally challenging: richly parameterized diversification models are often only weakly identifiable; different evolutionary mechanisms can yield nearly indistinguishable tree shapes. Here we use a model with evolutionary relatedness dependence to evaluate how much information about such forces can be recovered from simulated trees. We train graph neural networks and long short-term memory classifiers to distinguish three scenarios of feedback of diversity on diversification: effect of phylogenetic diversity (total branch length), evolutionary distinctiveness (average phylogenetic distance of a species to all other species in a clade), and nearest-neighbor distance (phylogenetic distance to the most closely related species). We also train a suite of regression networks to estimate the underlying diversification parameters. We then analyze classification performance, calibration of predicted class probabilities, regression errors, and their dependence on tree size and on the strength and sign of richness and relatedness effects. Across network architectures and complexity levels, scenario classification is only moderately accurate and strongly asymmetric as revealed by the confusion matrix. Trees generated under an effect of nearest-neighbor distance on diversification tend to be correctly classified, whereas those with an effect of evolutionary distinctiveness are frequently misclassified. Regression networks systematically shrink predictions toward the empirical mean, especially for complex models, suggesting broad regions of parameter space with low identifiability. Strong global dependence of diversification rates on diversity further erodes recoverability by driving large variations in tree size that mask the subtler signatures of relatedness effects.
In contrast, sufficiently strong speciation-relatedness effects can carve out narrow regions of parameter space in which scenarios and parameters become practically recoverable. Together, our results provide a map of when neural networks can and cannot infer diversification mechanisms from extant trees under our evolutionary relatedness dependence model, and they underscore the need for additional data or constraints when using flexible diversification models for macroevolutionary inference.

14

Estimating Bayesian phylogenetic information content using geodesic distances

Milkey, A.; Lewis, P. O.

2026-04-01 evolutionary biology 10.64898/2026.03.31.715656 medRxiv

Top 0.1%

65.7%

A new Bayesian measure of phylogenetic information content is introduced based on geodesic distances in treespace. The measure is based on the relative variance of phylogenetic trees sampled from the posterior distribution compared to the prior distribution. This ratio is expected to equal 1 if there is no information in the data about phylogeny and 0 if there is complete information. Trees can be scaled to have the same mean tree length to avoid dominance by edge length information and focus on topological information. The method scales well, requiring only that a valid sample can be obtained from both prior and posterior distributions. We show how dissonance (information conflict) among data sets can also be estimated. Both simulated and empirical examples are provided to illustrate that the new approach produces sensible and intuitive results.
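As a rough illustration of a posterior-to-prior variance ratio, the sketch below works from precomputed pairwise distance matrices. The abstract does not specify the variance estimator, so the mean squared pairwise distance used here is an assumption chosen for simplicity, the geodesic distances themselves are taken as given rather than computed, and the function names and toy matrices are hypothetical.

```python
def dispersion(distances):
    """Mean squared pairwise distance, used here as a stand-in for sample variance."""
    n = len(distances)
    total = sum(distances[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

def variance_ratio(prior_distances, posterior_distances):
    """Posterior dispersion relative to prior dispersion (near 1: little information)."""
    return dispersion(posterior_distances) / dispersion(prior_distances)

# Toy symmetric distance matrices among three sampled trees each.
prior = [[0.0, 2.0, 2.0], [2.0, 0.0, 2.0], [2.0, 2.0, 0.0]]
posterior = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
ratio = variance_ratio(prior, posterior)
print(ratio)
```

In this toy case the posterior trees sit closer together than the prior trees, so the ratio falls below 1, which is the direction the abstract associates with informative data.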

15

New approaches to detecting and characterizing introgression in large species trees

Mishra, S.; Pomar-Pallares, L.; Lanfear, R.; Hahn, M. W.

2026-06-01 evolutionary biology 10.64898/2026.05.30.728990 medRxiv

Top 0.1%

65.7%

Many current phylogeny-based methods to detect introgression use samples of species-quartets to detect asymmetries in gene tree frequencies. While this has proven to be an accurate and robust approach, applying it to larger species trees often means having to test dozens to hundreds of quartets across a tree. Furthermore, any single introgression event can have effects on multiple quartets--with no principled way to determine the number of unique events from a set of quartets--and the direction of introgression cannot always be determined from quartet comparisons alone. Here, we present a new approach to detecting introgression using the frequency with which more distantly related clades are attached to one another among a set of gene trees. Testing for introgression between pairs of branches is straightforward using these discordant attachment frequencies. We further show that the direction of introgression can be inferred between any pair of branches separated by at least two internal branches of the species tree, and that theoretical expectations of gene tree frequencies under introgression can be used to accurately determine the number of independent times genes have been exchanged. Application of these methods to data from cichlids and Drosophila demonstrates the power of the new approaches. The DAFT software package is available from: https://github.com/smishra677/DAFT/

16

What is the best method for estimating ancestral states from discrete characters?

Keating, J. N.

2023-09-01 evolutionary biology 10.1101/2023.08.31.555762 medRxiv

Top 0.1%

65.4%

Ancestral state estimation is a formal phylogenetic method for inferring the nature of ancestors and performing tests of character evolution. As such, it is among the most important tools available to evolutionary biologists. However, there is a profusion of methods available, the accuracy of which remains unclear. Here I use a simulation approach to test between parsimony and likelihood methods for estimating ancestral states from discrete binary characters. I simulate 500 characters using 15 different Markov generating models, a range of tree sizes (8-256 tips) and three topologies representing end members of tree symmetry and branch length heterogeneity. Simulated tip states were subjected to ancestral state estimation under the Equal Rates (ER) and All-Rates-Different (ARD) models, as well as under parsimony assuming accelerated transformations (ACCTRAN). The results demonstrate that both parsimony and likelihood approaches obtain high accuracy when applied to trees with more tips. Parsimony performs poorly when trees contain long branches, whereas the ER model performs well across simulations and is reasonably robust to model violation. The ER model frequently outperforms the ARD model, even when data are simulated using unequal rates. Furthermore, the ER model exhibits less transition rate error when compared to ARD models. These results suggest that ARD models may be overparameterized when character data is limited. Surprisingly, the difference in likelihood-based information criteria between models was found to be a poor predictor of difference in model error; better fitting models are not necessarily more accurate. However, there is a strong correlation between model uncertainty and model error; likelihood models with more certain ancestral state estimates are typically more accurate. Using empirical morphological datasets, I demonstrate that applying different methods often results in substantively different ancestral state estimates.
The results of the simulation study highlight the importance of incorporating fossils in ancestral state estimation. Fossils increase the total number of tips, break long branches and are closer to internal nodes, thereby lowering average branch length and overall branch length heterogeneity of trees. These factors will all contribute to increasing the accuracy of ancestral state estimates, irrespective of the method used.

17

On the utility of Deep Learning for model classification and parameter estimation on complex diversification scenarios.

Pena, P. G.; Iglesias, G.; Talavera, E.; Meseguer, A. S.; Sanmartin, I.

2025-08-28 evolutionary biology Community evaluation 10.1101/2025.08.27.671290 medRxiv

Top 0.1%

62.5%

Birth-Death models applied to dated phylogenies are a useful tool to study past diversification dynamics. Parameters in these stochastic models are typically inferred using likelihood-based methods such as Maximum Likelihood Estimation (MLE) or Bayesian Inference. However, these approaches exhibit computational tractability issues in the case of models of moderate to high complexity. One approach to increase model complexity while remaining computationally tractable in the context of birth-death modelling is machine learning. So far, these techniques have been explored in the context of serially-sampled phylogenies (phylodynamics) and trait-dependent birth-death models. Here, we explored the power of Convolutional Neural Networks (CNNs), a type of Deep Learning (DL) method, to solve classification and regression (parameter estimation) tasks under constant-rate and time-homogeneous, rate-variable birth-death models. In particular, we compared six diversification scenarios: Constant Birth-Death, High-Extinction, Mass-Extinction, Diversity-Dependent, Stasis-and-Radiate, and Waxing-and-Waning. We simulated 10,000 phylogenetic trees under each diversification scenario, which were encoded using a vectorization procedure that captures the topology and branch length information. The encoded trees were used to train or test a set of CNN models tailored to three empirical case studies differing in the number of tips. We compared CNN performance with MLE inference. Our results show that CNNs exhibited classification accuracy levels of 93-78%, whereas maximum likelihood estimation achieved levels of 74-70%. The most difficult scenarios to predict for the CNNs were the high-extinction and mass-extinction scenarios, which were often misidentified as one another.
For the regression tasks, mean average errors were comparable between CNN models and MLE inference, and they also coincided in their difficulty estimating ratio parameters such as mass extinction survival and turnover. Finally, we applied our CNNs to three empirical studies (eucalypts, conifers and cetaceans) and discussed potential shortcomings and future avenues for improvement in the application of deep-learning birth-death modelling approaches.
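
For readers unfamiliar with the constant-rate birth-death process underlying the simplest scenario above, the sketch below simulates only the number of lineages through time using a Gillespie-style loop. It is an illustrative assumption, not the authors' simulator: it does not build trees or encode them for a CNN, and the function name and parameters (`birth`, `death`, `t_max`, `n0`) are hypothetical.

```python
import random


def simulate_lineage_count(birth, death, t_max, n0=1, seed=None):
    """Simulate a constant-rate birth-death process up to time t_max.

    birth and death are per-lineage rates (hypothetical names).
    Returns the number of lineages alive at t_max, or 0 if the
    clade went extinct earlier.
    """
    rng = random.Random(seed)
    n, t = n0, 0.0
    while n > 0:
        total_rate = n * (birth + death)
        if total_rate <= 0:
            break  # no events can occur
        # Waiting time to the next event is exponentially distributed.
        t += rng.expovariate(total_rate)
        if t > t_max:
            break  # the next event falls after the end of the simulation
        # The event is a speciation with probability birth / (birth + death).
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
    return n


if __name__ == "__main__":
    counts = [simulate_lineage_count(1.0, 0.5, 3.0, seed=s) for s in range(1000)]
    print(sum(counts) / len(counts))
```

Under this process the expected number of lineages at time t is n0 · exp((birth − death) · t), so the printed average can be compared against that value as a sanity check.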

18

EMPIRE: The Ellipse Model for Phylogenetic Inference of Range Evolution

Swiston, S. K.; McHugh, S. W.; Landis, M. J.

2026-04-24 evolutionary biology 10.64898/2026.04.23.720387 medRxiv

Top 0.1%

61.3%

Many phylogenetic models of historical biogeography exist for describing how lineages move and evolve over time. Here, we present the Ellipse Model for Phylogenetic Inference of Range Evolution (EMPIRE), which models the movement and splitting of species range ellipses in continuous space, summarizing important attributes of each range, such as its position, size, and orientation. The framework allows us to reconstruct ancestral range ellipses, investigate rates governing important processes like movement, expansion, and elongation, and examine the spatial context of speciation, including asymmetric range inheritance at cladogenesis. We apply EMPIRE to the Australian Sphenomorphinae, a group of skinks whose diversification has coincided with substantial climatic change over the past ~36 million years. We find that speciation events are positively associated with aridification, while daughter lineages post-speciation do not tend to show evidence of ecological partitioning.

19

How much information is there for inferring species trees?

Milkey, A.; Chen, J.; Lewis, P. O.

2026-04-02 evolutionary biology 10.64898/2026.04.01.715836 medRxiv

Top 0.1%

61.0%

As modern phylogenomics datasets become increasingly large, it is useful to develop recommendations for how to subsample datasets for best species tree inference. Here we apply a new measure of phylogenetic information content that estimates the reduction in tree space occupied by a posterior sample of inferred trees relative to a prior sample in order to assess the effects of gene tree parameters on species tree estimation. We find that, consistent with earlier studies, when data are informative, more data result in better species tree inference. However, when data are uninformative, subsampling a dataset to include only the most informative loci may produce a better species tree sample. We perform analyses on a variety of simulated and empirical datasets.

20

Inaccurate fossil placement does not compromise tip-dated divergence times

Koch, N. M.; Garwood, R. J.; Parry, L. A.

2022-08-26 evolutionary biology 10.1101/2022.08.25.505200 medRxiv

Top 0.1%

60.9%

Time-scaled phylogenies underpin the interrogation of evolutionary processes across deep timescales, as well as attempts to link these to Earth's history. By inferring the placement of fossils and using their ages as temporal constraints, tip dating under the fossilised birth-death (FBD) process provides a coherent prior on divergence times. At the same time, it also links topological and temporal accuracy, as incorrectly placed fossil terminals should misinform divergence times. This could pose serious issues for obtaining accurate node ages, yet the interaction between topological and temporal error has not been thoroughly explored. We simulate phylogenies and associated morphological datasets using methodologies that incorporate evolution under selection, and are benchmarked against empirical datasets. We find that datasets of moderate sizes (300 characters) and realistic levels of missing data generally succeed in inferring the correct placement of fossils on a constrained extant backbone topology, and that true node ages are usually contained within Bayesian posterior distributions. While increased fossil sampling improves the accuracy of inferred ages, topological and temporal errors do not seem to be linked: analyses in which fossils resolve less accurately do not exhibit elevated errors in node age estimates. At the same time, divergence times are systematically biased, a pattern that stems from a mismatch between the FBD prior and the shape of our simulated trees. While these results are encouraging, suggesting even fossils with uncertain affinities can provide useful temporal information, they also emphasise that paleontological information cannot overturn discrepancies between model priors and the true diversification history.